The Analogy between Mechanical Translation and Library Retrieval
Authors
Abstract
Any analogy made between library retrieval and mechanical translation is usually made by assimilating library retrieval to mechanical translation. We desire to draw the converse analogy; that is, to assimilate mechanical translation to library retrieval. To do this, mechanical translation procedures must be generalised and made interlingual, until they become as general as library retrieval procedures already are. This generalisation can be made if the mechanical translation procedure is based on a thesaurus. The nature of a thesaurus is discussed in Section 3. This type of procedure has already been used for library retrieval, but not for M.T.; the use of a thesaurus for both fields enables a new, very general field to be exactly defined, namely the field of semantic transformation. This field would have application to library retrieval, mechanical translation, and probably also to mechanical abstracting. The purpose of this paper is to develop the application of this generalized procedure to mechanical translation, referring also to its use for library retrieval. For this purpose, an analytic examination of the translation procedure is required, as linguists object to the analogy that we are making by asserting that a library retrieval type of procedure will not translate syntax. It is asserted that a generalised mechanical translation procedure cannot translate grammar and syntax as these do not correspond between different languages. There is a general answer, and a particular one, to this criticism. The general answer is that present procedures for translating between different pairs of languages generate such complexity that they do not form an adequate basis for future M.T. research. The experimental work done by workers in the U.S.S.R. is examined. 
The particular answer is that since recent mechanical translation experiments using a thesaurus show, contrary to expectation, that this method can interlingually translate semantic meaning, it seems not impossible that, again contrary to expectation, it can be used to translate syntax. Such an extension is suggested by the linguist M. A. K. Halliday. He defines the syntactic operators of a source language in terms of a set of interlingual questions. This procedure is criticised.
M. MASTERMAN, R. M. NEEDHAM, and K. SPÄRCK JONES, Cambridge Language Research Unit, Cambridge, England.
918 The Design of New Systems AREA 5
A generalised translation procedure, using a thesaurus, is related to the semantic problems of mechanical translation. A thesaurus is defined. Recent work done by the Cambridge Language Research Unit is described to illustrate this procedure. Experiments, done also in the C.L.R.U., using the same procedure for library retrieval, are described. The result, a conception of a procedure of generalised semantic transformation, is considered. This semantic transformation procedure is extended to cover syntax. The questions used by Halliday can be turned into thesaurus heads. Some examples of interlingual translation of syntactic form are given. Research on these lines is continuing. If this method of generalised mechanical translation proves feasible, M.T. becomes straightforwardly an extended case of generalised retrieval.
Proposal to create a single general theoretic field of semantic transformation, with application to library retrieval and to M.T.
Many documentalists have insisted that there is an analogy between mechanisable procedures for retrieving documents and procedures used in mechanical translation (M.T.). The analogy between the two has usually been drawn, however, by assimilating library retrieval to translation; not the other way round. A coded library classification has been envisaged as an exact and interlingual library language.
Any request for information, made in a particular language, must be translated into the interlingua, and also coded, if the retrieval procedure is to be mechanical (1). We wish to draw the analogy conversely: that is, by assimilating interlingual mechanical translation to retrieval. Now, in the present state of research this analogy can only be drawn at all precisely between one form of library retrieval procedure, and one form of mechanical translation procedure; these two analogous procedures are those, in each field, which make use of a thesaurus. The proposal that an improved type of library retrieval procedure could be devised, using a thesaurus, of the type of Roget’s famous Thesaurus, instead of a term classification, has already been made by American workers in this field (2,3). The proposal that semantic meaning can be translated using a thesaurus was first made by the Cambridge Language Research Unit (England), at the Second International Conference for Machine Translation (4,5,6). We propose, then, that a conceptually based, thesaurus type of language classification should be used for a completely generalised retrieval procedure, this classification procedure being, by its nature, interlingual. The development of this procedure makes possible the definition of a general theoretic field of semantic transformation. Of this field, a well-defined mathematical model can be made (7).
MASTERMAN and JONES Mechanical Translation and Information Retrieval 919
Surprisingly enough, the proposal that such a general field should be created seems far more revolutionary to mechanical translation specialists than to documentalists specialising in library retrieval.
Translation specialists, and, in particular, linguists deny even the possibility of the analogy by maintaining that any classification of language based on a thesaurus can, at best, only hope to translate semantic meaning, whereas language is primarily a system of grammar and syntax; and both of these are notoriously monolingual. It could be said, indeed, that a library classification is like a non-grammatical language and that a thesauric library retrieval procedure could therefore hope to retrieve from it. But it is obvious, so the argument runs, that any mechanical translation procedure, before it starts dealing with subtle questions of semantic ambiguity, must deal with crude questions of how to translate grammatical and syntactic form; and these are both notoriously monolingual. Since, therefore, grammar and syntax cannot be translated by an interlingual thesaurus procedure, the analogy we wish to draw falls to the ground: it has no application to any procedure for mechanical translation.
The object of this paper is to refute this criticism by showing how a type of retrieval procedure, based on a thesaurus already being used for the experimental translation of semantic meaning, might also be extended so as to translate grammar and syntax. It is only by showing the procedure in action that we can hope to make clear what seems to us this most fundamental and important analogy between library retrieval and mechanical translation; we hope to show the nature of the generalised procedure by considering how it can deal with the particular problems of one of the fields in question, namely M.T. And this is all the more necessary in that the field of mechanical translation, unlike that of library retrieval, has not hitherto been approached at all from this point of view.
1. Application of the method to M.T.
On July 10th, 1957, M. A. K.
Halliday read a paper to the Cambridge Language Research Unit, and later, in a developed form, to the International Congress of Linguists, held in August, 1957, in Oslo, in which, speaking as a descriptive linguist, he described a method which might be used to carry out an interlingual analysis of the syntax of a language (8,9). This method was nicknamed the Twenty Questions Method of Analysis.
Before discussing the method, however, we must give a provisional reply to those M.T. workers who deny the existence of an analogy between the mechanical translation and retrieval fields. These may ask, “Why attempt an interlingual translation between languages when we know that the grammar and syntax of different languages do not correspond?” They may also ask, “Since it is mechanical translation of technical material which is urgently required in order to make scientific information more generally available, why not have, as the U.S.S.R. mechanical translation workers have, a set of two-language programmes, to translate from, e.g., Italian into French, or from Chinese into Russian, using for any particular text the appropriate programme?”
The answer to these questions, still keeping for the moment within the M.T. field, is that those who use such an approach, constructing a separate programme, to be stored by the machine, for every pair of languages, fail to consider the complexity which the method itself generates. Only one group of workers has extensively tried this method out: the Mechanical Translation Research Group of the U.S.S.R. Academy of Sciences. The project is described in an informative recent paper by I. K. Belskaya (10). This paper explicitly sets forth the restrictions on translation necessary to limit the complexities generated by the method itself. These are (1) severe limitation of the input text: only mathematical texts were used, the translation being from English into Russian; and the U.S.S.R.
group only at present envisages mechanical translation of scientific texts; (2) limitation of vocabulary: in order to limit the number of multiple meanings required for successful dictionary entries, a separate entry was used for each whole word—the attempt to economise on storage space by dividing words into “chunks,” or sub-words (11) was abandoned; (3) multiplication of dictionaries: different dictionaries were required for all the different fields, even when translating between the same pair of languages. These experiments show that a mechanical translation programme constructed on the Russian model does not straightforwardly translate between two languages. What such a translation programme does, when used with, e.g., a technical mathematical dictionary and a general dictionary containing the common words of the language, is successfully to translate English mathematical texts into Russian. This is a tremendous technical achievement. But it is inadequate as a directive for future research. The failures, cited by Belskaya, of attempts by cryptographers and logicians to find a common basis, statistical or mathematical, to language, might indeed cause us to abandon the goal of interlingual translation. But we cannot abandon the attempt to achieve intertextual translation. If we cannot feed into a computer and translate, from a single source language, e.g., a novel, a philosophical treatise, a mathematical system and a botanical paper, without using separate programmes and dictionaries, we are not translating between pairs of languages. We are merely translating between pairs of texts. And mechanical translation on this basis is not a commercial prospect. 
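The complexity argument above can be made concrete with a rough count. The sketch below is an illustration and not a figure taken from the Belskaya paper: it assumes that a pairwise scheme needs one programme for each ordered pair of languages together with one dictionary set per subject field, and that an interlingual scheme needs one analysis routine and one synthesis routine per language. The function names, and the choice of four fields (the novel, philosophical treatise, mathematical system and botanical paper mentioned above), are hypothetical.

```python
def pairwise_units(n_languages: int, n_fields: int = 1) -> int:
    """Programme-plus-dictionary units for a pairwise scheme (assumed model).

    One unit per ordered (source, target) language pair and per subject field.
    """
    return n_languages * (n_languages - 1) * n_fields


def interlingual_units(n_languages: int) -> int:
    """Units for an interlingual scheme (assumed model).

    One analysis into the interlingua and one synthesis out of it per language.
    """
    return 2 * n_languages


if __name__ == "__main__":
    # Four fields is an illustrative choice, not a number from the source.
    for n in (2, 4, 10):
        print(n, pairwise_units(n, n_fields=4), interlingual_units(n))
```

Under these assumptions the pairwise count grows with the square of the number of languages and is multiplied again by the number of fields, whereas the interlingual count grows only linearly with the number of languages.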
If we reconsider the Russian experiments, therefore, with the necessity for intertextual translation in mind, we are tempted to ask, “Can we at once have a more general approach to the problem?” This question seems all the more appropriate when we find that the U.S.S.R. group themselves think that a more general attempt to translate syntax might be successful. Belskaya says:
“Special experiments were made in order to find out whether the same grammatical programme can be applied to a text having as little to do with mathematics as, say, an article from The Times, or a page from Charles Dickens. These experiments proved the success of our ideas on the possibility of having a universal grammatical programme for the machine translation of any two languages. Our general principles have withstood another test: they were extended to cover machine translation from languages differing from English in structure as much as Japanese, Chinese, and German. These experiments having been successful, the principles (underlying the Russian grammar and syntax programme) may be considered as basic in the solution of machine translation problems.”
Thus even the U.S.S.R. group, whose approach is strictly particularised and inductive, admit that there may be general ascertainable principles underlying the mechanical translation of grammar and syntax. The next object, then, of linguists associated with machine translation, ought to be the discovery and development of these principles, rather than further experiments on particular texts. We propose that this research should be pursued by substituting for the particularised methods of linguistic analysis at present in use among workers on M.T. the completely generalised methods at present in use in library retrieval; that these, having been given thesauric linguistic application, should be put on a machine, and the results examined.
Such a method, which is essentially algorithmic and deductive, does not, of course, invalidate the step-by-step method of inductive generalisation, at present being used in the U.S.S.R. But the light that it throws upon the whole process of semantic transformation, and the simplifications which can be attained by means of it, make it in our view a preferable basis for the next stage of research.
2. A suggested interlingual analysis of syntax
That M.T. research could be thus generalised is the opinion already of one linguist, M. A. K. Halliday. We must next, therefore, examine and criticise the method he suggests for the interlingual mechanical translation of grammar and syntax, before further considering the problem of whether fully interlingual and intertextual mechanical translation of scientific texts is possible.
Halliday’s method was first to make a strictly monolingual analysis of the input language. He then made a further interlingual analysis of the language. For this interlingual analysis he does not recommend a generalised transfer grammar, of the kind developed by the American descriptive linguists, Z. Harris and N. Chomsky (12,13). He recommends using a more direct analytic method. This owes much to 19th century historical linguists. But Halliday’s analysis, unlike theirs, is not evolutionary. First, he makes a rigid distinction between types of chunk, the operators of a language, and the arguments. (Roughly, the functions of operators are dealt with by grammar books; those of arguments, by dictionaries.) The operators are identified by their relation, positive or negative, to a number of categories (provisionally about 60). The arguments are then classified by referring to groupings of these systems (14). Basically, therefore, Halliday makes first a monolingual grammar, and then an interlingual analysis of each language, the latter being quite distinct from the former.
The monolingual grammar resembles those of descriptive linguists, except that it refers only to operators; the arguments are later defined by referring to the operators. The interlingual analysis, the key to the whole method, demands reference to extralinguistic contexts; only after these have been ascertained are the operators related to the arguments. The relation of any operator to the extralinguistic context is determined by asking questions, the answer to which can be “Yes,” “No,” “Both,” “Neither.” This procedure resembles that of the game “Twenty Questions,” from which the method derives its name. The two methods differ, however, in that, for the linguistic analysis, in most cases, the answer to one question does not influence the next. The interlingual analysis may proceed as follows. Take, for example, the French operator la. A normal grammatical description would classify this as either the feminine definite article, or the feminine accusative pronoun. We assume that la has already been subjected to a monolingual French analysis giving, e.g., gender. We now carry out the interlingual analysis: we do not ask “Does la belong to any gender system?” because it is notorious that the gender systems of different languages do not correspond. Therefore we simply ask: “Can la tell us anything about sex?” By this change of question we refer, not to the intralinguistic context (i.e., that of French), but to the far more general extralinguistic context (i.e., that of the human race divided into sexes). English has no genders, French has two, German three, Icelandic six, but English, French, Germans, and Icelanders alike fall into communities of only two sexes. Therefore the answer to our last question is “Yes.” We may then ask: “Does la refer to animate or inanimate objects?” The answer is “Both.” To the question “Does it apply to present or non-present time?” the answer is “Neither.” And so on. 
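The question-and-answer analysis described above lends itself to a simple tabular representation. The following minimal sketch records the three answers given in the text for the French operator la. The question identifiers, the function names, and the answers entered for the English operator the are assumptions made purely for illustration; they are not part of Halliday’s published method.

```python
from enum import Enum


class Answer(Enum):
    """The four answers permitted by the Twenty Questions analysis."""
    YES = "Yes"
    NO = "No"
    BOTH = "Both"
    NEITHER = "Neither"


# Hypothetical identifiers for the three questions quoted in the text.
QUESTIONS = (
    "tells_about_sex",
    "animate_or_inanimate",
    "present_or_non_present_time",
)

# Answers for French "la", as stated in the text.
LA = {
    "tells_about_sex": Answer.YES,
    "animate_or_inanimate": Answer.BOTH,
    "present_or_non_present_time": Answer.NEITHER,
}

# Answers for English "the": an illustrative assumption, not from the source.
THE = {
    "tells_about_sex": Answer.NO,
    "animate_or_inanimate": Answer.BOTH,
    "present_or_non_present_time": Answer.NEITHER,
}


def profile(operator: dict) -> tuple:
    """Return the operator's answers in a fixed question order.

    A fixed order suffices here because, as the text notes, the answer to
    one question does not in most cases influence the next.
    """
    return tuple(operator[q] for q in QUESTIONS)


def shared_answers(a: dict, b: dict) -> list:
    """List the questions on which two operators receive the same answer."""
    return [q for q in QUESTIONS if a[q] == b[q]]


if __name__ == "__main__":
    print([ans.value for ans in profile(LA)])
    print(shared_answers(LA, THE))
```

Because each operator is reduced to answers about extralinguistic context, operators from different languages can be compared question by question without reference to either language's own gender or tense system.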
Now it is clear that, even from the pure linguist’s point of view, Halliday’s suggestion is of great research interest, since what he proposes is to use the precise and elegant analytic methods of contemporary linguistics to analyse, both monolingually and interlingually, the context grammar of particular texts. (These analytic methods, as is known, depend on being able to break up the older grammatical units, such as noun, verb and the rest, into weaker but more precisely definable units, special to each language, from which, by referring to the intralinguistic context-grammar of a text, the older type of unit can, where it is required for that particular language’s analysis, be built up.) In order to extend this method to make it apply to an interlingual grammar based on extralinguistic context analysis, it is evident that Halliday must take seriously the analogy, to which older linguists have paid nothing more than lip service, between intralinguistic context and extralinguistic context, and the way that each might be used to build up grammar and syntax. And, from the pure linguistic point of view, this is a very interesting thing to do.
But if we consider his interlingual analysis from the point of view of mechanical translation rather than from that of linguistics, it is clear that it has serious defects. These are (1) that the monolingual analysis is too complicated a way of obtaining the list of operators of an input language; a first approximation to these could be obtained with far less trouble by consulting a grammar book, and then by applying the procedures and finding out where and why the translation had turned out wrong; (2) that though the method analyses, it does not translate. For mechanical translation purposes it must be turned from a method of analysis into a translating procedure; (3) that the method is essentially not linguistic at all, but logical.
Therefore logical sophistication, rather than linguistic scholarship, should be used to make the question system more economical.
3. A procedure for the translation of semantic meaning, using a